Simultaneous error detection at two levels of syntactic annotation
Abstract
The paper describes a method for measuring compatibility between two levels of manual corpus annotation: shallow and deep. The proposed measures translate into a procedure for finding annotation errors at either level.
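The abstract does not spell out the measures themselves. The following is a purely illustrative sketch, not the paper's method. It assumes the shallow level is a list of non-overlapping chunks over token indices and the deep level is a dependency tree stored as a head array. Under that assumption, one simple compatibility test treats a chunk as compatible when its tokens form a single connected subtree, i.e. exactly one token in the chunk has its head outside it. Every chunk that fails the test is a candidate error at one level or the other. The names `Chunk`, `chunk_roots`, `incompatible_chunks` and `compatibility` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A shallow-level constituent over token indices [start, end) (assumed representation)."""
    label: str
    start: int
    end: int


def chunk_roots(chunk: Chunk, heads: list[int]) -> list[int]:
    """Tokens inside the chunk whose head lies outside it.

    heads[i] is the index of token i's head in the deep (dependency) layer,
    or -1 for the sentence root. This encoding is an assumption.
    """
    span = range(chunk.start, chunk.end)
    return [i for i in span if heads[i] not in span]


def incompatible_chunks(chunks: list[Chunk], heads: list[int]) -> list[Chunk]:
    """Chunks that are not a single connected subtree of the dependency tree.

    In a well-formed tree, a chunk is connected exactly when one of its
    tokens has an external head; any other count signals a disagreement
    between the two layers, i.e. an error candidate at either level.
    """
    return [c for c in chunks if len(chunk_roots(c, heads)) != 1]


def compatibility(chunks: list[Chunk], heads: list[int]) -> float:
    """Hypothetical score: share of chunks that agree with the dependency tree."""
    if not chunks:
        return 1.0
    return 1 - len(incompatible_chunks(chunks, heads)) / len(chunks)


if __name__ == "__main__":
    # "The old man saw a dog": tokens 0..5, 'saw' (3) is the root.
    heads = [2, 2, 3, -1, 5, 3]
    # Hypothetical chunking with a boundary error: "man" split off from "The old".
    chunks = [Chunk("NP", 0, 2), Chunk("NP", 2, 3), Chunk("VP", 3, 4), Chunk("NP", 4, 6)]
    print(incompatible_chunks(chunks, heads))  # [Chunk(label='NP', start=0, end=2)]
    print(compatibility(chunks, heads))        # 0.75
```

A flagged chunk only shows that the two layers disagree; whether the chunk boundary or a dependency head is wrong still has to be decided by an annotator, which is why such a check can surface errors at either level.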
Similar resources
Correcting Syntactic Annotation Errors Using a Synchronous Tree Substitution Grammar
This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into the ones whose errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our metho...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
On Detecting Errors in Dependency Treebanks
Dependency relations between words are increasingly recognized as an important level of linguistic representation that is close to the data and at the same time to the semantic functor-argument structure as a target of syntactic analysis and processing. Correspondingly, dependency structures play an important role in parser evaluation and for the training and evaluation of tools based on depend...
Detecting Errors in Discontinuous Structural Annotation
Consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics. While some research addresses the detection of inconsistencies in positional annotation (e.g., part-of-speech) and continuous structural annotation (e.g., syntactic constituency), no approach has yet been developed for automatically detecting annotation e...
Automatic Error Detection in Annotated Corpora
An annotated corpus is a linguistic resource that explicitly encodes information at the syntactic and semantic levels for each sentence. Annotated corpora play a crucial role in many applications of natural language processing (NLP). Error-free and consistent annotated corpora are vital for these applications. Creating annotated corpora is an expensive and time-consuming process. Errors or anomali...
Publication date: 2012